Directed acyclic graphs (DAGs) were developed into a framework for causal inference by the computer scientist Judea Pearl, who used them to reason systematically about causal questions. Judea Pearl’s The Book of Why provides an accessible introduction to directed acyclic graphs and makes for nice holiday reading.
Let’s assume we want to generate a data set with the following variables:

- \(\small D\): daily reading duration of parents
- \(\small B\): number of books in the household
- \(\small C\): child reading ability
Before we start simulating anything, we should think about the possible causal relationships between these variables. Drawing graphs is a good way to do this. Here are some possible scenarios:
library(dagitty)
library(rethinking)  # for drawdag()

par(mfrow = c(1,3))

dag1 = dagitty(
"dag{
D->B;
B->C;
D->C
}")
coord.list =
  list(
    x = c(D=0, B=2, C=1),
    y = c(D=0, B=0, C=1))
coordinates(dag1) = coord.list
drawdag(dag1, cex = 2)

dag2 = dagitty(
"dag{
D->B;
D->C
}")
coordinates(dag2) = coord.list
drawdag(dag2, cex = 2)

dag3 = dagitty(
"dag{
D->B;
B->C
}")
coordinates(dag3) = coord.list
drawdag(dag3, cex = 2)
These graphs are built from two types of components:

- nodes, which represent the variables, and
- edges, which represent relationships between variables.

All graphs are made of nodes and edges. What is special about directed acyclic graphs is that the edges are directed (arrows point from cause to effect) and that there are no cycles: following the arrows, one can never return to a node one started from.1
To simulate data from a DAG, we have to decide what distributions the variables have. To keep things simple, we just use normally distributed data for now. The order in which we simulate variables is determined by the DAG. The first thing we do is check which variables are exogenous in our model (DAG), i.e. which variables are not caused by any other variable in the DAG, because these have to be simulated first.
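As a small sketch (not in the original code), dagitty can report the components of a DAG object directly; the comments indicate what the calls should return for the dag1 object defined above:

# query the DAG object with dagitty helper functions
names(dag1)               # the nodes of the DAG
edges(dag1)               # the directed edges, as a data frame
exogenousVariables(dag1)  # nodes without parents; here this should be "D"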
Let us focus on the left DAG, which we show here again:
drawdag(dag1, cex = 0.5, lwd = 1)
Here, \(D\) is the exogenous variable, so we simulate it first.
set.seed(12345)
N = 250
D = rnorm(N)
\(\small B\) and \(\small C\) are both endogenous variables: they are determined by other variables in our model/DAG. Because \(\small B\) is a parent of \(\small C\), we have to simulate \(\small B\) next:
B = 1*D + rnorm(N)
Finally, we can calculate \(\small C\) as an effect of \(\small D\) and \(\small B\):
C = 1*D + 0.2*(B) + rnorm(N)
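As a quick sanity check (not part of the original code), all three variables should be clearly positively correlated given how they were simulated:

# correlations among the simulated variables
round(cor(cbind(D = D, B = B, C = C)), 2)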
As a recap, here is the model for a simple linear regression that models \(\small C\) as a function of \(\small B\).
| What | Notation | quap R-code |
|---|---|---|
| Likelihood | \(C_i \sim Normal(\mu_i,\sigma)\) | C ~ dnorm(mu, sigma) |
| Linear model | \(\mu_i = \alpha + \beta B_i\) | mu[i] <- a + b*B[i] |
| Prior | \(\alpha \sim Normal(0,.5)\) | a ~ dnorm(0, .5) |
| Prior | \(\beta \sim Normal(0,1)\) | b ~ dnorm(0, 1) |
| Prior | \(\sigma \sim Exponential(1)\) | sigma ~ dexp(1) |
Here we put the data together and estimate the model with quap:
# helper to display model coefficients as a table
# (uses the rethinking, magrittr, knitr and kableExtra packages)
coefkable = function(m, caption) {
  cap = paste("Coefficients for model", caption)
  m %>%
    precis %>%
    as.matrix() %>%
    kable(caption = cap, digits = 2) %>%
    kable_styling(full_width = F)
}
d = data.frame(C = C, B = B, D = D)
m.C_B <-quap(
alist(
C ~ dnorm(mu,sigma),
mu <- a + bB*B,
a ~ dnorm(0,.5),
bB ~ dnorm(0,1),
sigma ~ dexp(1)),
data=d)
# display the coefficients with the coefkable helper defined above
coefkable(m.C_B, "m.C_B")
| | mean | sd | 5.5% | 94.5% |
|---|---|---|---|---|
| a | 0.08 | 0.08 | -0.04 | 0.21 |
| bB | 0.73 | 0.05 | 0.64 | 0.82 |
| sigma | 1.24 | 0.06 | 1.15 | 1.33 |
And we plot the associations implied by prior and posterior:
par(mfrow = c(1,2))
prior = extract.prior(m.C_B)
mu = link(m.C_B,
post=prior,
data=list(B=c(-3,3)))
matplot(c(-3,3),
t(mu[1:50,]),
xlab = "B", ylab = "C", ylim = c(-4.5,4.5),
type = "l", col = "blue", lty = 1)
title("Prior predictive samples")
B_seq = seq(from=-3,to=3,length.out=30)
mu = link(m.C_B,data=list(B=B_seq))
mu.mean <-apply(mu,2,mean)
mu.HPDI <-apply(mu,2,HPDI)
plot( C~B,data=d, cex = .5,
col=adjustcolor("black", alpha = .2),
ylim = c(-4.5,4.5),
xlim = c(-3,3))
lines( B_seq,mu.mean,lwd=2, col = "blue")
shade( mu.HPDI, B_seq, col = adjustcolor("blue", alpha = .2))
title("Posterior predictive mean & HDPI")
So, if we disregard our knowledge about the daily reading duration of parents, we conclude that there is a strong relationship between the number of books in a household and child reading ability. If one were willing to interpret the association as causal, one could even think that the number of books in a household is an important cause of child reading ability.
This example makes clear that we can come to wrong conclusions if our analysis model, here the simple linear regression \(\small C \sim Normal(\alpha+\beta B,\sigma)\) (the right DAG in the first figure), does not match the data generating process or model, which is \(\small C \sim Normal(\alpha+\beta_1 B + \beta_2 D,\sigma)\) (the left DAG in the first figure).
The question then is, how can we figure out what the correct data generating process (true DAG) is? Domain knowledge is obviously useful. In addition, one can also use the fact that DAGs (can) imply conditional independencies. Two variables are independent conditional on a third variable when the first two variables are unrelated given that one knows the value of the third variable or fixes it to a specific value.
Let’s look at the left and right DAGs from above again:
par(mfrow = c(1,2))
drawdag(dag1, cex = 0.5, lwd = 1)
drawdag(dag3, cex = 0.5, lwd = 1)
For the left DAG (dag1), there are no implied conditional independencies: every pair of variables is connected by an arrow, so no two variables become independent by conditioning on the third.
So if the left DAG is correct, \(\small D\) and \(\small C\) are still related after we account for the relationship between \(\small B\) and \(\small C\).
Onto the right DAG (dag3): here \(\small D\) influences \(\small C\) only through \(\small B\), so once we know \(\small B\), \(\small D\) carries no additional information about \(\small C\).
The DAG to the right thus has an implied conditional independence, \(\small C \perp\!\!\!\perp D \mid B\), that we can check with our data.
Determining implied conditional independencies gets harder when DAGs are more complex. In this case one can use software like the R package dagitty to determine conditional independencies as follows:
library(dagitty)
impliedConditionalIndependencies(dag1)
impliedConditionalIndependencies(dag2)
## B _||_ C | D
Here B _||_ C | D means that \(\small B\) is independent of \(\small C\) conditional on \(\small D\), which you can also find written as \(\small B \perp\!\!\!\perp C \mid D\). Note that impliedConditionalIndependencies(dag1) produces no output, because dag1 does not imply any conditional independencies.
impliedConditionalIndependencies(dag3)
## C _||_ D | B
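The same question can also be asked for a specific pair of variables with dagitty’s dseparated() function; a small sketch (the comments state what we expect for these DAGs):

# is C d-separated from (i.e. independent of) D, given B?
dseparated(dag3, "C", "D", "B")  # should be TRUE under dag3
dseparated(dag1, "C", "D", "B")  # should be FALSE under dag1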
Now that we have learned that under some DGPs there should be conditional independencies, how can we test them?
Remember that we defined conditional independence as independence given that we already know the third variable. One way to incorporate what we already know about a third variable is to use multiple (linear) regression.
Assuming that \(\small C\) is our outcome of interest, we can check whether, in a model that uses both \(\small D\) and \(\small B\) to predict \(\small C\), \(\small D\) remains associated with \(\small C\) when we also account for the effect of \(\small B\). If it does, this speaks in favour of dag1. If it does not, it speaks in favour of dag3.
To go from a simple to a multiple linear regression, we simply add a predictor variable to the linear predictor:
| What | Notation | quap R-code |
|---|---|---|
| Simple linear model | \(\mu_i = \alpha + \beta_B B_i\) | mu[i] <- a + bB*B[i] |
| Multiple linear model | \(\mu_i = \alpha + \beta_B B_i + \beta_D D_i\) | mu[i] <- a + bB*B[i] + bD*D[i] |
Accordingly, the quap model for the multiple linear regression looks a lot like the one for the simple linear regression.
m.C_BD <-quap(
alist(
C ~ dnorm(mu,sigma),
mu <- a + bB*B + bD*D,
a ~ dnorm(0,0.2),
bB ~ dnorm(0,1),
bD ~ dnorm(0,1),
sigma ~ dexp(1)),
data=d)
coefkable(m.C_BD,"m.C_BD")
| | mean | sd | 5.5% | 94.5% |
|---|---|---|---|---|
| a | 0.04 | 0.06 | -0.06 | 0.15 |
| bB | 0.19 | 0.07 | 0.07 | 0.30 |
| bD | 0.97 | 0.10 | 0.81 | 1.13 |
| sigma | 1.06 | 0.05 | 0.98 | 1.13 |
Looking at the quap model, you can see that we use the same parameters for the priors of bD and bB. This makes sense because both variables have the same scale (variability). We might want to change this if the variables had different scales / variability, for example by standardizing them first, as sketched below.
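For example, if \(\small B\) were a raw book count in the hundreds while \(\small D\) was measured in hours, one might put all variables on a common scale before assigning identical slope priors. A minimal sketch of this hypothetical preprocessing step (not part of the original analysis):

# standardize all variables so that a common prior such as dnorm(0, 1)
# is reasonable for every slope
d_std = data.frame(
  C = as.numeric(scale(C)),
  B = as.numeric(scale(B)),
  D = as.numeric(scale(D)))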
Because this is a fairly simple model, less model checking is needed, and we directly look at the regression coefficients from the first model m.C_B, which used only \(\small B\) as a predictor, and from the last model we estimated, m.C_BD, which used \(\small B\) and \(\small D\).
plot(coeftab(m.C_BD,m.C_B),par=c("bD","bB"), xlim = c(0,1.2))
First, we check the conditional independence that would justify using only \(\small B\) as a predictor, namely that \(\small \beta_D\) (bD) is (close to) zero in the m.C_BD model. The figure shows that this is clearly not the case, and therefore m.C_B is not the correct model to estimate the effect of \(\small B\). Instead, we must use the m.C_BD model.
The other thing we can see is that the estimated association between the number of books in the household and child reading ability is weaker in the model m.C_BD, which uses an analysis consistent with the correct data generating process. This is an example of confounding bias, a more general term of which spurious association is a special case. If we omit a common cause of our predictor and outcome from the analysis, this leads to a biased estimate of the association between predictor and outcome. Also note that the regression parameters we estimated with the m.C_BD model are consistent with the values we used to generate the data above.
Did we now successfully estimate the effect of daily reading hours on child reading?
To answer this question, we first need to distinguish two types of effects:

- the direct effect of \(\small D\) on \(\small C\) (the path \(\small D \rightarrow C\)), and
- the total effect of \(\small D\) on \(\small C\), which also includes the indirect effect via \(\small B\) (the path \(\small D \rightarrow B \rightarrow C\)).

The m.C_BD model estimates the direct effect but not the total effect, because the coefficient for \(\small D\) is calculated while adjusting for the indirect effect via \(\small B\). If we want to estimate the total effect of \(\small D\), we have to model \(\small C\) only as an effect of \(\small D\).2
Here is the regression model:
m.C_D <-quap(
alist(
C ~ dnorm(mu,sigma),
mu <- a + bD*D,
a ~ dnorm(0,.5),
bD ~ dnorm(0,1),
sigma ~ dexp(1)),
data=d)
coefkable(m.C_D,"m.C_D")
| | mean | sd | 5.5% | 94.5% |
|---|---|---|---|---|
| a | 0.05 | 0.07 | -0.05 | 0.16 |
| bD | 1.17 | 0.06 | 1.07 | 1.27 |
| sigma | 1.07 | 0.05 | 1.00 | 1.15 |
Now we can plot the coefficients from the m.C_D and
m.C_BD models together:
plot(coeftab(m.C_BD,m.C_D),par=c("bD"))
As expected, the total effect from the m.C_D
model is larger than the direct effect from the
m.C_BD model.
One important consequence of thinking systematically about the processes that produced the data, and of representing and analysing them with DAGs, is that which variables belong in a regression analysis depends on two things:

- the assumed causal structure (the DAG, i.e. the data generating process), and
- the effect one wants to estimate (for example, the total or the direct effect of a particular variable).

Depending on the answers to these questions, different variables have to be included in the regression (see the sketch below for how dagitty can derive this).
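For the simple DAG used here this can be worked out by hand, but dagitty can also derive the required adjustment sets automatically. A small sketch, assuming we are interested in the effect of \(\small D\) (exposure) on \(\small C\) (outcome) under dag1; the comments state the expected results:

# which variables must be adjusted for under dag1?
adjustmentSets(dag1, exposure = "D", outcome = "C", effect = "total")
# should return the empty set: for the total effect, do not adjust for B
adjustmentSets(dag1, exposure = "D", outcome = "C", effect = "direct")
# should return { B }: for the direct effect, adjust for B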
One common mistake, known as the Table 2 fallacy (Table 2 is often the second table in medical articles3), is to interpret all the coefficients from a single regression model as causal effects. In most cases this is not a valid approach.
Plotting against residuals tells us what multiple regression does.
We discussed earlier that in multiple regression each regression coefficient captures the effect of a variable conditional on accounting for the effects of all the other variables.
This means that we only look at the effect of the variation in that variable that is independent from the variation of the other variables. If we look for example at the two variables \(\small B\) and \(\small D\), these are correlated:
plot(D,B)
If we run a regression like \(\small C \sim Normal(\alpha + \beta B, \sigma)\), the coefficient \(\small \beta\) will capture the effect of the total variation of \(\small B\), which includes some variation due to \(\small D\). This becomes obvious if you remember that we simulated B = 1*D + rnorm(N).
One way to do a regression that returns the effect of \(\small B\) that is not due to \(\small D\) is to create a new variable \(\small B_R\) that does not include information that is also in \(\small D\). We can do this by regressing \(\small B \sim Normal(\alpha + \beta D, \sigma)\) and calculating \(\small B_R\) as the difference between \(\small B\) and the linear predictor (\(\small \mu_i\) or mu[i]).
m.B_D <-quap(
alist(
B ~ dnorm(mu,sigma),
mu <- a+bD*D,
a ~ dnorm(0,.5),
bD ~ dnorm(0,1),
sigma ~ dexp(1)),
data=d)
The following figure shows observed and predicted values on the left, where the vertical lines from the observed values to the regression line are the residuals. The right-hand side shows the residuals plotted against \(\small D\). As expected, these are uncorrelated, which means that \(\small B_R\) does not include any information that is also in \(\small D\).
# calculate the linear predictor mu
# for all observed data points
mu = colMeans(link(m.B_D,data=d))
# mu to plot regression lines
mu.l = colMeans(link(m.B_D,data=list(D = c(-3,3))))
par(mfrow = c(1,2))
# select only a subset of values
# otherwise the plot gets very busy
idx = sample(nrow(d),75)
# plot raw data
plot(D[idx],B[idx], ylab = "B (observed)", xlab = "D")
# plot regression line
lines(c(-3,3),mu.l, col = "blue")
# plot residual lines
for (i in idx) {
lines(rep(D[i],2), c(B[i],mu[i]), col = "blue")
}
# calculate residuals
Br = B - mu
# plot residuals of Br against predictor D
plot(D[idx],Br[idx], ylab = expression(B[r]), xlab = "D")
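As a quick numeric check (not in the original code), the residualized \(\small B_R\) should be essentially uncorrelated with \(\small D\):

# correlation between the residuals and D; should be close to zero
round(cor(Br, D), 3)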
Here is an animation that shows how residuals capture only the variation in \(\small B\), after we have removed the effect of \(\small D\). In the animation, we do this by gradually reducing the slope and ending at a value of 0 for the slope, which indicates no influence of \(\small D\) on \(\small B\).
Understanding residuals
Now we can use \(\small B_R\) as a predictor of \(\small C\) and plot the regression line:
d$Br = Br
m.C_Br <-quap(
alist(
C ~ dnorm(mu,sigma),
mu <- a+bBr*Br,
a ~ dnorm(0,.5),
bBr ~ dnorm(0,1),
sigma ~ dexp(1)),
data=d)
coefkable(m.C_Br,"m.C_Br")
| | mean | sd | 5.5% | 94.5% |
|---|---|---|---|---|
| a | 0.20 | 0.10 | 0.03 | 0.36 |
| bBr | 0.19 | 0.11 | 0.01 | 0.36 |
| sigma | 1.62 | 0.07 | 1.50 | 1.74 |
Here are the regression lines for the model with \(\small B_R\) as the predictor (left) and with \(\small B\) as the predictor (right). As expected, the relationship is much stronger for \(\small B\). Also compare the regression coefficient for \(\small B_R\) with the regression coefficient for \(\small B\) in the model m.C_BD. They are nearly identical.
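To make this comparison explicit, one can tabulate the two models side by side with rethinking’s coeftab (a small check, not in the original code):

# bBr from m.C_Br and bB from m.C_BD should be nearly identical
coeftab(m.C_Br, m.C_BD)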
par(mfrow = c(1,2))
## Regression on residuals
Br_seq = seq(from=-3,to=3,length.out=30)
mu = link(m.C_Br,data=list(Br=Br_seq))
mu.mean <-apply(mu,2,mean)
mu.HPDI <-apply(mu,2,HPDI)
plot( C~Br,data=d, cex = .5,
col=adjustcolor("black", alpha = .2),
ylim = c(-4.5,4.5),
xlim = c(-3,3))
lines( Br_seq,mu.mean,lwd=2, col = "blue")
shade( mu.HPDI, Br_seq, col = adjustcolor("blue", alpha = .2))
## Regression on original B
B_seq = seq(from=-3,to=3,length.out=30)
mu = link(m.C_B,data=list(B=B_seq))
mu.mean <-apply(mu,2,mean)
mu.HPDI <-apply(mu,2,HPDI)
plot( C~B,data=d, cex = .5,
col=adjustcolor("black", alpha = .2),
ylim = c(-4.5,4.5),
xlim = c(-3,3))
lines( B_seq,mu.mean,lwd=2, col = "blue")
shade( mu.HPDI, B_seq, col = adjustcolor("blue", alpha = .2))
Posterior predictions tell us if the model is any good.
If our model captures the data well, the observed and model-predicted data should be correlated. We can visualize this correlation in a scatter plot. Because we do not expect the predictions to coincide exactly with the observed values, it is useful to add credible or prediction intervals (e.g. HPDIs or PIs) to the predicted values. Then one can check whether those overlap with the identity line that indicates perfect agreement of observed and predicted data.
# no newdata, so uses original data
mu = link(m.C_BD)
# summarize samples across cases
mu_mean = apply(mu,2,mean)
mu_PI = apply(mu,2,PI) # credible intervals
# simulate observations
# again no newdata, so uses original data
C_sim = rethinking::sim(m.C_BD,n=1e3)
C_PI = apply(C_sim,2,PI) # prediction intervals
When we add intervals, we can calculate them either from the linear predictor (credible intervals, the right plot in the next figure) or from simulated observations (prediction intervals, the left plot in the next figure).
If the main concern is whether the observed data fall within the range of plausible predicted data, one should use prediction intervals from posterior predictive simulations. With, for example, a 90% prediction interval, about 90% of the observed values should fall within the intervals of the predicted values.
If the main concern is to detect systematic or strong deviations between observed and predicted data, these are easier to see with the narrower credible intervals from the linear predictor.
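As a rough numeric complement to these plots (not in the original code), one can count how many observed values fall inside their prediction intervals; with rethinking’s default 89% intervals the share should be roughly 0.89:

# share of observed C values inside their (default 89%) prediction intervals
mean(d$C >= C_PI[1, ] & d$C <= C_PI[2, ])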
par(mfrow = c(1,2))
## posterior predictions + prediction intervals
plot(mu_mean[idx] ~ d$C[idx],
col="blue", cex = .5,
xlab="Observed child reading",
ylab="Predicted child reading",
ylim = range(C_PI[,idx]))
# regression line for perfect coincidence
# of observed and predicted values
abline( a=0,b=1,lty=2)
# add prediction intervals to the predictions
for (i in idx)
lines(rep(d$C[i],2),
C_PI[,i],
col="blue")
title("posterior predictions + prediction intervals")
## posterior predictions + credible intervals
plot(mu_mean[idx] ~ d$C[idx],
col="blue", cex = .5,
xlab="Observed child reading",
ylab="Predicted child reading",
ylim = range(C_PI[,idx]))
# regression line for perfect coincidence
# of observed and predicted values
abline( a=0,b=1,lty=2)
# add credible intervals to predictions
for (i in idx)
lines(rep(d$C[i],2),
mu_PI[,i],
col="blue")
title("posterior predictions + credible intervals")
In this example, there are no strong deviations between predicted and observed values. After all, the regression is based on the true model. Here are the same types of plots if the true DGP includes a non-linear effect of \(\small D\) on \(\small C\), but our model assumes only a linear relationship:
d2 = d
d2$C = d2$D - .2*d2$D^2 + .2*d2$B + rnorm(N)
m.C_BD2 <-quap(
alist(
C ~ dnorm(mu,sigma),
mu <- a + bD*D + bB*B,
a ~ dnorm(0,0.2),
bD ~ dnorm(0,1),
bB ~ dnorm(0,1),
sigma ~ dexp(1)),
data=d2)
mu = link(m.C_BD2)
mu_mean = apply(mu,2,mean)
mu_PI = apply(mu,2,PI)
C_sim = rethinking::sim(m.C_BD2,n=1e3)
C_PI = apply(C_sim,2,PI)
par(mfrow = c(1,2))
## posterior predictions + prediction intervals
plot(mu_mean[idx] ~ d2$C[idx],
col="blue", cex = .5,
xlab="Observed child reading",
ylab="Predicted child reading",
ylim = range(C_PI[,idx]))
# regression line for perfect coincidence
# of observed and predicted values
abline( a=0,b=1,lty=2)
# add prediction intervals to the predictions
for (i in idx)
lines(rep(d2$C[i],2),
C_PI[,i],
col="blue")
title("posterior predictions + prediction intervals")
## posterior predictions + credible intervals
plot(mu_mean[idx] ~ d2$C[idx],
col="blue", cex = .5,
xlab="Observed child reading",
ylab="Predicted child reading",
ylim = range(C_PI[,idx]))
# regression line for perfect coincidence
# of observed and predicted values
abline( a=0,b=1,lty=2)
# add credible intervals to predictions
for (i in idx)
lines(rep(d2$C[i],2),
mu_PI[,i],
col="blue")
title("posterior predictions + credible intervals")
Here, we see that the model systematically overestimates low values of
child reading ability.
Counterfactual plots tell us about the effect of an intervention (if the causal model is true).
Let’s assume you are thinking about an intervention to improve child reading ability. You know that parental daily reading has a stronger effect than the number of books in the household. On the other hand, it may be easier to convince parents to accept some book gifts than to make them read more. So you want to know by how much you can expect child reading to change if you increase either daily reading hours or the number of books in a household.
This can be calculated with counterfactuals, so called because they compute outcomes under conditions that were not actually observed. The recipe is simple: fix the manipulated variable to the values of interest (ignoring its usual causes), and then simulate its descendants in the DAG from the fitted model.
Here is as a refresher our DAG:
drawdag(dag1, cex = 0.5, lwd = 1)
If we want to manipulate \(\small D\), we have to calculate the resulting direct effect \(\small D \rightarrow C\) and the indirect effect \(\small D \rightarrow B \rightarrow C\). In order to do this, we need regression coefficients for all these paths. With quap we can estimate them in one model:
m.BC <-quap(
alist(
## D->C<-B
C ~ dnorm(mu,sigma),
mu <- a + bB*B + bD*D,
a ~ dnorm(0,0.2),
bD ~ dnorm(0,0.5),
bB ~ dnorm(0,0.5),
sigma ~ dexp(1),
## D -> B
B ~ dnorm(mu_B,sigma_B),
mu_B <- aB + bDB*D,
aB ~ dnorm(0,0.2),
bDB ~ dnorm(0,0.5),
sigma_B ~ dexp(1)),
data=d)
Next we specify a sequence of values for \(\small D\), for which we would like to estimate \(\small C\):
D_seq <-seq(from=-2,to=2,length.out=10)
We can use the sim function from the rethinking package to simulate first values of \(\small B\) and then, using the just-simulated \(\small B\) values (together with the manipulated \(\small D\)), also values of \(\small C\). The vars argument to sim tells it both which observables to simulate and in which order.
# prepare the data
sim_dat = data.frame(D = D_seq)
# simulate B and then C,using D_seq
s = rethinking::sim(m.BC,
data = sim_dat,
vars = c("B", "C"))
plot(sim_dat$D,colMeans(s$C),
type="l", col = "blue",
xlab="manipulated D",
ylab="counterfactual C")
shade(apply(s$C,2,PI),
sim_dat$D,
col = adjustcolor("blue",alpha = .2))
mtext( "Total counterfactual effect of D on C")
This is a large effect. Next, we calculate by how many standard deviations child reading improves if we increase the daily reading of parents from 0 to 1 (which is also a 1 sd increase, because the sd of parental reading is 1).
sim2_dat = data.frame(D = c(0,1))
s2 = rethinking::sim(m.BC,
data=sim2_dat,
vars=c("B","C"))
mean( s2$C[,2]-s2$C[,1])
## [1] 1.149065
Now we look at counterfactuals for \(\small B\). We calculate the effect of \(\small B\) while we fix \(\small D\) to zero. 4
sim_datB = data.frame(
B=seq(from=-2,to=2,length.out=10),
D=0)
sB = rethinking::sim(m.BC,
data=sim_datB,
vars="C")
And we plot the simulated data, with the counterfactual effect of \(\small D\) on the left side and of \(\small B\) on the right side.
par(mfrow = c(1,2))
plot(sim_dat$D,colMeans(s$C),
ylim=c(-2.5,2.5), type="l", col = "blue",
xlab="manipulated D",
ylab="counterfactual C")
shade(apply(s$C,2,PI),
sim_dat$D,
col = adjustcolor("blue",alpha = .2))
mtext( "Total counterfactual effect of D on C")
plot(sim_datB$B,colMeans(sB),
ylim=c(-2.5,2.5), type="l", col = "blue",
xlab="manipulated B",
ylab="counterfactual C")
shade( apply(sB,2,PI),
sim_datB$B,
col = adjustcolor("blue",alpha = .2))
mtext( "Total counterfactual effect of B on C")
As expected, the effect of \(\small B\) is much weaker.
simB2_dat = data.frame(B=c(0,1), D = 0)
s2 = rethinking::sim(m.BC,
data=simB2_dat,
vars=c("C"))
mean(s2[,2]-s2[,1])
## [1] 0.1772927
Write down a multiple regression to evaluate the claim: Neither amount of funding nor size of laboratory is by itself a good predictor of time to PhD degree; but together these variables are both positively associated with time to degree. Write down the model definition and indicate which side of zero each slope parameter should be on.
Fu = rnorm(100)
Si = scale(-1*(Fu + rnorm(100)))
Ti = 0.3*Fu + 0.3*Si + rnorm(100)
df = data.frame(Fu = Fu, Si = Si, Ti = Ti)
pairs(df)
summary(lm(Ti~Fu))
##
## Call:
## lm(formula = Ti ~ Fu)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.75900 -0.64293 -0.05141 0.52720 2.48079
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.13403 0.09976 1.344 0.182
## Fu -0.09558 0.10072 -0.949 0.345
##
## Residual standard error: 0.9888 on 98 degrees of freedom
## Multiple R-squared: 0.009107, Adjusted R-squared: -0.001004
## F-statistic: 0.9007 on 1 and 98 DF, p-value: 0.3449
summary(lm(Ti~Si))
##
## Call:
## lm(formula = Ti ~ Si)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.68412 -0.64274 -0.01283 0.51408 2.41558
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.12149 0.09749 1.246 0.2157
## Si 0.18919 0.09798 1.931 0.0564 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.9749 on 98 degrees of freedom
## Multiple R-squared: 0.03665, Adjusted R-squared: 0.02682
## F-statistic: 3.728 on 1 and 98 DF, p-value: 0.05639
summary(lm(Ti~Si+ Fu))
##
## Call:
## lm(formula = Ti ~ Si + Fu)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.74187 -0.65316 -0.01052 0.47708 2.38115
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.11012 0.09960 1.106 0.2716
## Si 0.25053 0.14115 1.775 0.0791 .
## Fu 0.08662 0.14305 0.606 0.5462
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.9781 on 97 degrees of freedom
## Multiple R-squared: 0.04028, Adjusted R-squared: 0.02049
## F-statistic: 2.035 on 2 and 97 DF, p-value: 0.1362
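For completeness, here is one way the model definition asked for in the exercise could be written with quap. This is a sketch: the prior choices are assumptions, the model name and parameter names (bF for funding, bS for laboratory size) are hypothetical, and both slope parameters should end up above zero.

# possible quap model for the exercise; both slopes expected to be positive
m.Ti_FuSi <- quap(
  alist(
    Ti ~ dnorm(mu, sigma),
    mu <- a + bF*Fu + bS*Si,
    a ~ dnorm(0, 0.2),
    bF ~ dnorm(0, 0.5),
    bS ~ dnorm(0, 0.5),
    sigma ~ dexp(1)),
  data = data.frame(Ti = Ti, Fu = Fu, Si = as.numeric(Si)))
precis(m.Ti_FuSi)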
If one has multiple measurements of the same variable over time, each measurement would be a new node↩︎
Note that this is true for the current model. If there were another confounder, i.e. a common cause of \(\small B\) and \(\small C\), we would have to include it in the regression.↩︎
Table 1 describes the sample↩︎
Fixing \(\small D\) to a specific value is OK here because this is a simple linear regression without interactions. Things get more complicated when interactions play a role or non-linear link functions are used, as e.g. in logistic regression.↩︎